(54) Video compression method and system

(57) A video compression method and system
including object-oriented compression plus error
correction using decoder feedback. More particularly, an
error correcting apparatus comprising a first decoder
having a deinterleaver coupled to an output thereof, a
second decoder coupled to the output of the deinterleaver,
and a memory coupled to the first decoder. A feedback
device is coupled to the second decoder and the memory,
and includes an output connected to the deinterleaver. The
second decoder is capable of correcting errors in a
codeword and correcting errors of related codewords in
the memory. Consequently, errors in the deinterleaver for
codewords for the second decoder are corrected.
Description

BACKGROUND OF THE INVENTION

The invention relates to electronic video methods
and devices, and, more particularly, to digital
communication and storage systems with compressed video.

Video communication (television, teleconferencing,
and so forth) typically transmits a stream of video
frames (images) along with audio over a transmission
channel for real time viewing and listening by a receiver.
However, transmission channels frequently add corrupting
noise and have limited bandwidth (e.g., television
channels limited to 6 MHz). Consequently, digital video
transmission with compression enjoys widespread use.
In particular, various standards for compression of
digital video have emerged and include H.261, MPEG-1,
and MPEG-2, with more to follow, including in
development H.263 and MPEG-4. There are similar audio
compression methods such as CELP and MELP.

Tekalp, Digital Video Processing (Prentice Hall
1995), Clarke, Digital Compression of Still Images and
Video (Academic Press 1995), and Schafer et al, Digital
Video Coding Standards and Their Role in Video
Communications, 83 Proc. IEEE 907 (1995), include
summaries of various compression methods, including
descriptions of the H.261, MPEG-1, and MPEG-2
standards plus the H.263 recommendations and
indications of the desired functionalities of MPEG-4.

H.261 compression uses interframe prediction to
reduce temporal redundancy and discrete cosine
transform (DCT) on a block level together with high spatial
frequency cutoff to reduce spatial redundancy. H.261 is
recommended for use with transmission rates in
multiples of 64 Kbps (kilobits per second) to 2 Mbps
(megabits per second).

The H.263 recommendation is analogous to H.261
but for bitrates of about 22 Kbps (twisted pair telephone
wire compatible) and with motion estimation at half-pixel
accuracy (which eliminates the need for loop filtering
available in H.261) and overlapped motion compensation
to obtain a denser motion field (set of motion vectors)
at the expense of more computation and adaptive
switching between motion compensation with 16 by 16
macroblock and 8 by 8 blocks.

MPEG-1 and MPEG-2 also use temporal prediction
followed by two dimensional DCT transformation on a
block level as H.261, but they make further use of
various combinations of motion-compensated prediction,
interpolation, and intraframe coding. MPEG-1 aims at
video CDs and works well at rates of about 1-1.5 Mbps
for frames of about 360 pixels by 240 lines and 24-30
frames per second. MPEG-1 defines I, P, and B frames
with I frames intraframe, P frames coded using
motion-compensation prediction from previous I or P frames,
and B frames using motion-compensated bi-directional
prediction/interpolation from adjacent I and P frames.

MPEG-2 aims at digital television (720 pixels by
480 lines) and uses bit-rates up to about 10 Mbps with
MPEG-1 type motion compensation with I, P, and B
frames plus adds scalability (a lower bitrate may be
extracted to transmit a lower resolution image).

However, the foregoing MPEG compression methods
result in a number of unacceptable artifacts such as
blockiness and unnatural object motion when operated
at very-low-bit-rates. Because these techniques use
only the statistical dependencies in the signal at a block
level and do not consider the semantic content of the
video stream, artifacts are introduced at the block
boundaries under very-low-bit-rates (high quantization
factors). Usually these block boundaries do not
correspond to physical boundaries of the moving objects and
hence visually annoying artifacts result.

Unnatural motion arises when the limited band- 
width forces the frame rate to fall below that required for 
smooth motion. 

MPEG-4 is to apply to transmission bit-rates of 10
Kbps to 1 Mbps and is to use a content-based coding
approach with functionalities such as scalability,
content-based manipulations, robustness in error prone
environments, multimedia data access tools, improved
coding efficiency, ability to encode both graphics and
video, and improved random access. A video coding
scheme is considered content scalable if the number
and/or quality of simultaneous objects coded can be
varied. Object scalability refers to controlling the
number of simultaneous objects coded and quality
scalability refers to controlling the spatial and/or temporal
resolutions of the coded objects. Scalability is an
important feature for video coding methods operating across
transmission channels of limited bandwidth and also
channels where the bandwidth is dynamic. For example,
a content-scalable video coder has the ability to
optimize the performance in the face of limited
bandwidth by encoding and transmitting only the important
objects in the scene at a high quality. It can then choose
to either drop the remaining objects or code them at a
much lower quality. When the bandwidth of the channel
increases, the coder can then transmit additional bits to
improve the quality of the poorly coded objects or
restore the missing objects.

Musmann et al, Object-Oriented Analysis-Synthesis
Coding of Moving Images, 1 Sig. Proc.: Image
Comm. 117 (1989), illustrates hierarchical moving
object detection using source models. Tekalp, chapters
23-24, also discusses object-based coding.

Medioni et al, Corner Detection and Curvature
Representation Using Cubic B-Splines, 39
Comp.Vis.Grph.Image Processing 267 (1987), shows
encoding of curves with B-Splines. Similarly, Foley et al,
Computer Graphics (Addison-Wesley 2d Ed.), pages
491-495 and 504-507, discusses cubic B-splines and
Catmull-Rom splines (which are constrained to pass
through the control points).

In order to achieve efficient transmission of video, a
system must utilize compression schemes that are
bandwidth efficient. The compressed video data is then
transmitted over communication channels which are
prone to errors. For video coding schemes which exploit
temporal correlation in the video data, channel errors
result in the decoder losing synchronization with the
encoder. Unless suitably dealt with, this can result in
noticeable degradation of the picture quality. To
maintain satisfactory video quality or quality of service, it is
desirable to use schemes to protect the data from these
channel errors. However, error protection schemes
come with the price of an increased bit-rate. Moreover,
it is not possible to correct all possible errors using a
given error-control code. Hence, it becomes necessary
to resort to some other techniques in addition to error
control to effectively remove annoying and visually
disturbing artifacts introduced by these channel induced
errors.

In fact, a typical channel, such as a wireless
channel, over which compressed video is transmitted is
characterized by high random bit error rates (BER) and
multiple burst errors. The random bit errors occur with a
probability of around 0.001 and the burst errors have a
duration that usually lasts up to 24 milliseconds (msec).

Error correcting codes such as the Reed-Solomon
(RS) codes correct random errors up to a designed
number per block of code symbols. Problems arise
when codes are used over channels prone to burst
errors because the errors tend to be clustered in a small
number of received symbols. The commercial digital
music compact disc (CD) uses interleaved codewords
so that channel bursts may be spread out over multiple
codewords upon decoding. In particular, the CD error
control encoder uses two shortened RS codes with 8-bit
symbols from the code alphabet GF(256). Thus 16-bit
sound samples each take two information symbols.
First the samples are encoded twelve at a time (thus 24
symbols) by a (28,24) RS code, then the 28-symbol
codewords pass a 28-branch interleaver with delay
increments of 28-symbols between branches. Thus 28
successive 28-symbol codewords are interleaved
symbol by symbol. After the interleaving, the 28-symbol
blocks are encoded with a (32,28) RS coder to output
32-symbol codewords for transmission. The decoder is
a mirror image: a (32,28) RS decoder, 28-branch
deinterleaver with delay increment 4 symbols, and a (28,24)
RS decoder. The (32,28) RS decoder can correct 1
error in an input 32-symbol codeword and can output 28
erased symbols for two or more errors in the 32-symbol
input codeword. The deinterleaver then spreads these
erased symbols over 28 codewords. The (28,24) RS
decoder is set to detect up to and including 4 symbol
errors which are then replaced with erased symbols in
the 24-symbol output words; for 5 or more errors, all 24
symbols are erased. This corresponds to erased music
samples. The decoder may interpolate the erased
music samples with adjacent samples. Generally, see
Wicker, Error Control Systems for Digital Communication
and Storage (Prentice Hall 1995).
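
The benefit of interleaving against burst errors can be illustrated with a
simplified sketch. The code below is not the CD's delay-line interleaver; it
assumes a plain block interleaver, and the names interleave and deinterleave
as well as the 28 by 28 array size are illustrative choices. It shows that a
burst of 28 consecutive corrupted symbols in the channel stream lands on each
28-symbol codeword at most once after deinterleaving.

```python
def interleave(codewords):
    """Block interleaver: codewords are rows, the stream is read column by column."""
    n = len(codewords[0])
    return [cw[i] for i in range(n) for cw in codewords]


def deinterleave(stream, num_codewords):
    """Inverse of interleave: rebuild the rows from the column-ordered stream."""
    n = len(stream) // num_codewords
    return [[stream[i * num_codewords + r] for i in range(n)]
            for r in range(num_codewords)]


# 28 codewords of 28 symbols; each symbol is tagged (codeword index, position).
codewords = [[(r, i) for i in range(28)] for r in range(28)]
stream = interleave(codewords)

# Hypothetical channel burst: 28 consecutive symbols are corrupted (None).
burst_start = 100
for k in range(burst_start, burst_start + 28):
    stream[k] = None

received = deinterleave(stream, 28)
errors_per_codeword = [row.count(None) for row in received]
```

With one corrupted symbol per codeword, a code that corrects a single symbol
error per codeword would recover the whole burst in this idealized setting.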

There are several hardware and software
implementations of the H.261, MPEG-1, and MPEG-2
compression and decompression. The hardware can be
single or multichip integrated circuit implementations
(see Tekalp pages 455-456) or general purpose
processors such as the Ultrasparc or TMS320C80 running
appropriate software. Public domain software is
available from the Portable Video Research Group at Stanford
University.

The present invention provides video compression
apparatus and method as defined in the claims. The
present invention in one aspect thereof further provides
content-based video compression with difference
region encoding instead of strictly moving object
encoding, blockwise contour encoding, motion compensation
failure encoding connected to the blockwise contour
tiling, subband including wavelet encoding restricted to
subregions of a frame, scalability by uncovered
background associated with objects, and error robustness
through embedded synchronization in each moving
object's code plus coder feedback to a deinterleaver. It
also provides video systems with applications for this
compression, such as video telephony and fixed camera
surveillance for security, including time-lapse
surveillance, with digital storage in random access
memories.

Advantages include efficient low bit-rate video
encoding with object scalability and error robustness
with very-low-bit-rate video compression which allows
convenient transmission and storage. This permits low
bit-rate teleconferencing and also surveillance
information storage by random access hard disk drive rather
than serial access magnetic tape. And the segmentation
of moving objects permits concentration on any one
or more of the moving objects (MPEG-4).

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be further described,
by way of example, with reference to the accompanying
drawings in which:

Figure 1 shows a telephony system according to a
preferred embodiment of the invention;

Figure 2 illustrates a surveillance system in
accordance with a preferred embodiment of the invention;

Figure 3 is a flow diagram for video compression in
accordance with the invention;

Figures 4a-d show motion segmentation;

Figures 5a-g illustrate boundary contour encoding;

Figure 6 shows motion compensation;

Figure 7 illustrates motion failure regions;

Figure 8 shows the control grid on the motion failure
regions;

Figures 9a-b show a single wavelet filtering stage;

Figures 10a-c illustrate wavelet decomposition;

Figure 11 illustrates a zerotree for wavelet
coefficient quantization;

Figure 12 is a wavelet compressor block diagram;

Figures 13a-v show scalability steps;

Figures 14a-b are a scene with and without a
particular object;

Figures 15a-b show an error correcting coder and
decoder; and

Figures 16a-b illustrate decoder feedback.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Overview of Compression and Decompression

Figure 1 illustrates a block diagram of a preferred
embodiment video-telephony (teleconferencing) system
which transmits both speech and an image of the
speaker using preferred embodiment compression,
encoding, decoding, and decompression including error
correction with the encoding and decoding. The
teleconferencing system 5 comprises a video camera 10
and microphone 12 coupled to a compression coder 14
for compressing received video and audio signals from
the camera 10 and microphone 12. The compressed
video and audio signals are transmitted in a
predetermined transmission channel 16 via a suitable
transmission medium 22 to a decompression decoder 18.
The decompression decoder 18 decompresses the
compressed video and audio signals and provides the
decompressed signals to a video display and speaker
20. Of course, Figure 1 shows only transmission in one
direction and to only one receiver; in practice a second
camera and second receiver would be used for
transmission in the opposite direction and a third or more
receivers and transmitters could be connected into the
system. The video and speech are separately
compressed and the allocation of transmission channel
bandwidth between video and speech may be
dynamically adjusted depending upon the situation. The costs
of telephone network bandwidth demand a low-bit-rate
transmission. Indeed, very-low-bit-rate video
compression finds use in multimedia applications where visual
quality may be compromised.

Figure 2 shows a first preferred embodiment
surveillance system, generally denoted by reference
numeral 200, as comprising one or more fixed video
cameras 202 focused on stationary background 204
(with occasional moving objects 206 passing in the field
of view) plus video compressor 208 together with
remote storage 210 plus decoder and display 220.
Compressor 208 provides compression of the stream of
video images of the scene (for example, 30 frames a
second with each frame 176 by 144 8-bit monochrome
pixels) so that the data transmission rate from
compressor 208 to storage 210 may be very low, for example 22
Kbits per second, while retaining high quality images.
System 200 relies on the stationary background and
only encodes moving objects (which appear as regions
in the frames which move relative to the background)
with predictive motion to achieve the low data rate. This
low data rate enables simple transmission channels
from cameras to monitors and random access memory
storage such as magnetic hard disk drives available for
personal computers. Indeed, a single telephone line
with a modem may transmit the compressed video
image stream to a remote monitor. Further, storage of
the video image stream for a time interval, such as a day
or week as required by the particular surveillance
situation, will require much less memory after such
compression.

Video camera 202 may be a CCD camera with an
in-camera analog-to-digital converter so that the output
to compressor 208 is a sequence of digital frames as
generally illustrated in Figure 2; alternatively, analog
cameras with additional hardware may be used to
generate the digital video stream of frames. Compressor
208 may be hardwired or, more conveniently, a digital
signal processor (DSP) with the compression steps
stored in onboard memory, RAM or ROM or both. For
example, a TMS320C50 or TMS320C80 type DSP as
manufactured by Texas Instruments Inc. may suffice.
Also, for a teleconferencing system as shown in Figure
1, error correction with real time reception may be
included and implemented on general purpose
processors.

Figure 3 shows a high level flow diagram for the
preferred embodiment video compression methods
which include the following steps for an input consisting
of a sequence of frames, F_0, F_1, F_2, ..., F_N, with each
frame 144 rows of 176 pixels or 288 rows of 352 pixels
and with a frame rate of 10 frames per second. Details
of the steps appear in the following sections.

Frames of these two sizes partition into arrays of 9
rows of 11 macroblocks with each macroblock being 16
pixels by 16 pixels or 18 rows of 22 macroblocks. The
frames will be encoded as I pictures or P pictures; B
pictures with their backward interpolation would create
overly large time delays for very low bitrate
transmission. An I picture occurs only once every 5 or 10
seconds, and the majority of frames are P pictures. For the
144 rows of 176 pixels size frames, roughly an I picture
will be encoded with 20 Kbits and a P picture with 2
Kbits, so the overall bitrate will be roughly 22 Kbps (only
10 frames per second or less). The frames may be
monochrome or color with the color given by an intensity
frame (Y signal) plus one quarter resolution
(subsampled) color combination frames (U and V signals).
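
The bitrate estimate above can be checked with a short calculation. The
sketch below assumes exactly one I picture per period with all other frames
coded as P pictures of the stated sizes; the function name
average_bitrate_kbps and its parameters are illustrative and do not appear
in the original text.

```python
def average_bitrate_kbps(i_period_s, fps=10, i_kbits=20.0, p_kbits=2.0):
    """Average bitrate when one I picture is sent every i_period_s seconds.

    All other frames in the period are P pictures. The default sizes are the
    rough figures quoted for 176 by 144 frames.
    """
    frames_per_period = i_period_s * fps
    total_kbits = i_kbits + (frames_per_period - 1) * p_kbits
    return total_kbits / i_period_s


rate_10s = average_bitrate_kbps(10)  # one I picture every 10 seconds
rate_5s = average_bitrate_kbps(5)    # one I picture every 5 seconds
```

Under these assumptions the 10-second I picture spacing gives about 21.8
Kbps and the 5-second spacing about 23.6 Kbps, consistent with the quoted
figure of roughly 22 Kbps.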
(1) For the current frame 30, encode the zeroth
frame F_0 as an I picture like in MPEG-1,2 using a
waveform coding technique based on the DCT or
wavelet transform. For the DCT case, partition the
frame into 8 by 8 blocks; compute the DCT of each
block; cutoff the high spatial frequencies; quantize
and encode the remaining frequencies; and
transmit. The encoding includes run length encoding,
then Huffman encoding, and then error correction
encoding. For the wavelet case, compute the
multi-level decomposition of the frame; quantize and
encode the resulting wavelet coefficients; and
transmit. Other frames, when they are the current frame,
will also be encoded as I pictures with the frequency
dependent upon the transmission channel bitrate.
And for F_N to be an I picture, encode in the same
manner.

(2) For frame F_N to be a P picture, detect moving
objects in the frame by finding the regions of
change from reconstructed F_{N-1} to F_N using motion
segmentation. Reconstructed F_{N-1} is the
approximation to F_{N-1} which is actually transmitted as
described below. Note that the regions of change
need not be partitioned into moving objects plus
uncovered background and will only approximately
describe the moving objects. However, this
approximation suffices and provides more efficient low
coding. Of course, an alternative would be to also
make this partition into moving objects plus
uncovered background through mechanisms such as
inverse motion vectors to determine if a region
maps to outside of the change region in the
previous frame and thus is uncovered background, edge
detection to determine the object, or presumption of
object characteristics (models) to distinguish the
object from background.

(3) For each connected component of the regions
of change from (2), code its boundary contour,
including any interior holes, in a contour
representation and coding step (block 34). Thus the
boundaries of moving objects are not exactly coded; rather,
the boundaries of entire regions of change are
coded and approximate the boundaries of the
moving objects. The boundary coding may be either by
splines approximating the boundary or by a binary
mask indicating blocks within the region of change.
The spline provides more accurate representation
of the boundary, but the binary mask uses a smaller
number of bits. Note that the connected
components of the regions of change may be determined
by a raster scanning of the binary image mask and
sorting pixels in the mask into groups, which may
merge, according to the sorting of adjacent pixels.
The final groups of pixels are the connected
components (connected regions). For an example of a
program, see Ballard et al, Computer Vision (Prentice
Hall) at pages 149-152. For convenience in the
following, the connected components (connected
regions) may be referred to as (moving) objects.

(4) Remove temporal redundancies in the video
sequence by a motion estimation (block 36) which
estimates the motion of the objects from the
previous frame. In particular, match a 16 by 16 block in
an object in the current frame F_N with the 16 by 16
block in the same location in the preceding
reconstructed frame F_{N-1} plus translations of this block
up to 15 pixels in all directions. The best match
defines the motion vector for this block, and an
approximation F'_N to the current frame F_N can be
synthesized from the preceding frame F_{N-1} by using
the motion vectors with their corresponding blocks
of the preceding frame.

(5) After the use of motion of objects to synthesize
an approximation F'_N, there may still be areas
within the frame which contain a significant amount
of residual information, such as for fast changing
areas. That is, the regions of difference between F_N
and the synthesized approximation F'_N have motion
segmentation applied analogous to the steps (2)-(3)
to define the motion failure regions which
contain significant information (block 38).

(6) Encode the motion failure regions from (5) using
a waveform coding technique based on the DCT or
wavelet transform in residual encoding step (block
40). For the DCT case, tile the regions with 16 by 16
macroblocks, apply the DCT on 8 by 8 blocks of the
macroblocks, quantize and encode (runlength and
then Huffman coding). For the wavelet case, set all
pixel values outside the regions to zero, apply the
multi-level decomposition, quantize and encode
(zerotree and then arithmetic coding) only those
wavelet coefficients corresponding to the selected
regions.

(7) Assemble the encoded information for I pictures
(DCT or wavelet data) and P pictures (objects
ordered with each object having contour, motion
vectors, and motion failure data). These can be
codewords from a table of Huffman codes; this is
not a dynamic table but rather generated
experimentally.

(8) Insert resynchronization words at the beginning
of each I picture data, each P picture, each contour
data, each motion vector data, and each motion
failure data. These resynchronization words are
unique in that they do not appear in the Huffman
codeword table and thus can be unambiguously
determined.

(9) Encode the resulting bitstream from (8) with
Reed-Solomon codes together with interleaving.
Then transmit or store.
(10) Decode a received encoded bitstream by
Reed-Solomon plus deinterleaving. The
resynchronization words help after decoding failure and also
provide access points for random access. Further,
the decoding may be with shortened Reed-Solomon
decoders on either side of the deinterleaver
plus feedback from the second decoder to the first
decoder (a stored copy of the decoder input) for
enhanced error correction.

(11) Additional functionalities such as object
scalability (selective encoding/decoding of objects in the
sequence) and quality scalability (selective
enhancement of the quality of the objects) which
result in a scalable bitstream are also supported.
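
The DCT path of step (1) can be sketched for a single 8 by 8 block. The
code below is a minimal illustration under stated assumptions, not the coder
of the preferred embodiment: it uses an orthonormal DCT-II, a triangular
high-frequency cutoff that keeps coefficients with u + v < keep, and a
uniform quantizer step. The names dct_2d and cutoff_and_quantize and the
parameters keep and step are hypothetical. Run length, Huffman, and error
correction encoding are omitted.

```python
import math

N = 8  # block size used for the DCT


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N by N block of intensities."""
    def scale(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def cutoff_and_quantize(coeffs, keep=4, step=16):
    """Zero coefficients with u + v >= keep, then quantize the rest uniformly."""
    return [[int(round(coeffs[u][v] / step)) if u + v < keep else 0
             for v in range(N)]
            for u in range(N)]


# A flat gray block: all of its energy ends up in the DC coefficient.
flat_block = [[128] * N for _ in range(N)]
flat_coeffs = dct_2d(flat_block)
flat_quantized = cutoff_and_quantize(flat_coeffs)
```

For the flat block the DC coefficient is 8 times the pixel value and every
other coefficient is numerically zero, which is why smooth regions cost few
bits after quantization.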

MOVING OBJECT DETECTION AND SEGMENTATION (Block 32)

The first preferred embodiment method detects and
segments moving objects by use of regions of
difference between successive video frames but does not
attempt to segregate such regions into moving objects
plus uncovered background. This simplifies the
information but appears to provide sufficient quality. In
particular, for frame F_N at each pixel find the absolute value of
the difference in the intensity (Y signal) between F_N and
reconstructed F_{N-1}. For 8-bit intensities (256 levels
labeled 0 to 255), the camera calibration variability
would suggest taking the intensity range of 0 to 15 to be
dark and the range 240-255 to be saturated brightness.
The absolute value of the intensity difference at a pixel
will lie in the range from 0 to 255, so eliminate minimal
differences and form a binary image of differences by
thresholding (set any pixel absolute difference of less
than or equal to 5 or 10 (depending upon the scene
ambient illumination) to 0 and any pixel absolute
difference greater than 30 to 1). This yields a binary image
which may appear speckled: Figures 4a-b illustrate two
successive frames and Figure 4c the binary image of
thresholded absolute difference with black pixels
indicating 1s, that is, significant differences, and the
white background pixels indicating 0s.

Then eliminate small isolated areas in the binary
image, such as would result from noise, by median
filtering (replace a 1 at a pixel with a 0 if the 4 or 8 nearest
neighbor pixels are all 0s).
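
The thresholding and isolated-pixel removal can be sketched as follows. This
is a simplified illustration: it assumes a single threshold low (the text
quotes 5 or 10 for the lower bound) and treats every larger difference as a
1, it uses the 4 nearest neighbors, and it treats pixels outside the frame
as 0. The function names difference_mask and remove_isolated are
hypothetical.

```python
def difference_mask(cur, prev, low=10):
    """Binary image: 1 where |cur - prev| exceeds low, else 0."""
    return [[1 if abs(c - p) > low else 0 for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]


def remove_isolated(mask):
    """Replace a 1 with a 0 when its 4 nearest neighbors are all 0."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        # Pixels outside the frame count as 0 (an assumption of this sketch).
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[1 if mask[y][x] and (at(y - 1, x) or at(y + 1, x)
                                  or at(y, x - 1) or at(y, x + 1)) else 0
             for x in range(w)]
            for y in range(h)]


prev_frame = [[100] * 4 for _ in range(3)]
cur_frame = [row[:] for row in prev_frame]
cur_frame[0][0] = 200   # isolated change, e.g. noise
cur_frame[1][2] = 150   # two adjacent changed pixels
cur_frame[1][3] = 160
cur_frame[2][1] = 105   # change below the threshold

raw_mask = difference_mask(cur_frame, prev_frame)
clean_mask = remove_isolated(raw_mask)
```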

Next apply the morphological close operation
(dilate operation followed by erode operation) to fill in
between close-by 1s; that is, replace the speckled areas
of Figure 4c with solid areas. Use dilate and erode
operations with a circular kernel of radius K pixels (K may be
11 for QCIF frames and 13 for CIF frames); in particular,
the dilate operation replaces a 0 pixel with a 1 if any
other pixel within K pixels of the original 0 pixel is a 1
pixel, and the erode operation replaces a 1 pixel with a
0 unless all pixels within K pixels of the original 1 pixel
are all also 1 pixels. After the close operation, apply the
open operation (erode operation followed by dilate
operation) to remove small isolated areas of 1s. This yields
a set of connected components (regions) of 1 pixels
with fairly smooth boundaries as illustrated in Figure 4d.
Note that a connected component may have one or
more interior holes which also provide boundary
contours.
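
The dilate, erode, close, and open operations can be written directly from
their definitions. The sketch below is a slow but literal illustration with
a circular kernel of radius k; it assumes that pixels outside the frame are
simply ignored, which is a border convention not stated in the text, and the
function names are illustrative.

```python
def _disk(k):
    """Offsets (dy, dx) lying within a circle of radius k pixels."""
    return [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
            if dy * dy + dx * dx <= k * k]


def dilate(img, k):
    """A pixel becomes 1 if any in-frame pixel within radius k is 1."""
    h, w = len(img), len(img[0])
    offsets = _disk(k)
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in offsets) else 0
             for x in range(w)]
            for y in range(h)]


def erode(img, k):
    """A pixel stays 1 only if every in-frame pixel within radius k is 1."""
    h, w = len(img), len(img[0])
    offsets = _disk(k)
    return [[1 if all(img[y + dy][x + dx]
                      for dy, dx in offsets
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)]
            for y in range(h)]


def close_op(img, k):
    """Morphological close: dilate, then erode."""
    return erode(dilate(img, k), k)


def open_op(img, k):
    """Morphological open: erode, then dilate."""
    return dilate(erode(img, k), k)


speckled = [[0, 0, 0, 1, 0, 1, 0, 0, 0]]   # two 1s separated by a gap
closed = close_op(speckled, 1)
noisy = [[0, 1, 0, 0, 1, 1, 1, 0, 0]]      # isolated 1 plus a solid run
opened = open_op(noisy, 1)
```

In the one-row examples the close operation fills the single-pixel gap and
the open operation removes the isolated 1 while keeping the run of three.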

Then raster scan the binary image to detect and
label connected regions and their boundary contours (a
pixel which is a 1 and has at least one nearest neighbor
pixel which is a 0 is deemed a boundary contour pixel).
A procedure such as ccomp (see Ballard reference or
the Appendix) can accomplish this. Each of these
regions presumptively indicates one or more moving
objects plus background uncovered by the motion.
Small regions can be disregarded by using a threshold
such as a minimum difference between extreme
boundary pixel coordinates. Such small regions may grow in
succeeding frames and eventually arise in the motion
failure regions of a later frame. Of course, a connected
region cannot be smaller than the K-pixel-radius
dilate/erode kernel, otherwise it would not have
survived the open operation.
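
Raster-scan labeling with merging groups can be sketched with a two-pass
algorithm and a union-find table. This is not the ccomp procedure from the
Ballard reference or the Appendix; it is a generic illustration that assumes
4-connectivity, and the name label_components is hypothetical.

```python
def label_components(mask):
    """Two-pass connected component labeling with 4-connectivity.

    Returns (labels, count): labels has 0 for background and 1..count for
    the connected regions of 1 pixels.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # First pass: assign provisional labels and record merges.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if not up and not left:
                parent.append(len(parent))       # new group
                labels[y][x] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root  # groups merge
                labels[y][x] = root
            else:
                labels[y][x] = up or left

    # Second pass: replace provisional labels by compact final labels.
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
_, u_count = label_components(u_shape)

two_blobs = [[1, 0, 0],
             [0, 0, 1]]
blob_labels, blob_count = label_components(two_blobs)
```

The U-shaped mask shows the merge case: its two arms first receive
different provisional labels and are joined when the scan reaches the
bottom row.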

CONTOUR REPRESENTATION (Block 34)

The preferred embodiments have an option of
boundary contour encoding by either spline
approximation or blocks straddling the contour; this permits a
choice of either high resolution or low resolution and
thus provides a scalability. The boundary contour
encoding with the block representation takes fewer bits
but is less accurate than the spline representation. Thus
a tradeoff exists which may be selected according to the
application.

(i) Block boundary contour representation. 

For each of the connected regions in the binary
image derived from F_N in the preceding section, find the
bounding rectangle for the region by finding the smallest
and largest boundary pixel x coordinates and y
coordinates: the smallest x coordinate (x_0) and the smallest y
coordinate (y_0) define the lower lefthand rectangle
corner (x_0,y_0) and the largest coordinates define the upper
righthand corner (x_1,y_1); see Figure 5a showing a
connected region and Figure 5b the region plus the
bounding rectangle.

Next tile the rectangle with 16 by 16 macroblocks
starting at (x_0,y_0) and with the macroblocks extending
past the upper and/or righthand edges if the rectangle's
sides are not multiples of 16 pixels; see Figure 5c
illustrating a tiling. If the tiling would extend outside of the
frame, then translate the corner (x_0,y_0) to just keep the
tiling within the frame.

Form a bit map with a 1 representing the tiling
macroblocks that have at least 50 of their 256 pixels (i.e., at
least about 20%) on the boundary or inside the region
and a 0 for macroblocks that do not. This provides the
block description of the boundary contour: the starting
corner (x_0,y_0) and the bit map. See Figure 5d showing
the bit map.
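
The block description can be sketched as a function that takes a binary
region mask and returns the starting corner plus the bit map. The sketch
makes several assumptions: image rows grow downward, so the starting corner
is the smallest row and column rather than a lower lefthand corner; tiles
that extend past the frame count only their in-frame pixels instead of
translating the corner; and the mask contains at least one 1 pixel. The name
block_bitmap and its parameters are hypothetical, with defaults taken from
the 16 by 16 macroblock size and the 50-pixel rule.

```python
def block_bitmap(mask, block=16, min_pixels=50):
    """Return ((x0, y0), bitmap) describing a region by macroblocks."""
    h, w = len(mask), len(mask[0])
    rows = [y for y in range(h) if any(mask[y])]
    cols = [x for x in range(w) if any(mask[y][x] for y in range(h))]
    y0, y1, x0, x1 = rows[0], rows[-1], cols[0], cols[-1]

    # Number of tiles needed to cover the bounding rectangle (ceiling division).
    n_rows = -(-(y1 - y0 + 1) // block)
    n_cols = -(-(x1 - x0 + 1) // block)

    bitmap = []
    for by in range(n_rows):
        bits = []
        for bx in range(n_cols):
            count = sum(mask[y][x]
                        for y in range(y0 + by * block,
                                       min(y0 + (by + 1) * block, h))
                        for x in range(x0 + bx * block,
                                       min(x0 + (bx + 1) * block, w)))
            bits.append(1 if count >= min_pixels else 0)
        bitmap.append(bits)
    return (x0, y0), bitmap


# Tiny example with 2 by 2 "macroblocks" and a 2-pixel occupancy rule.
region = [[0, 0, 0, 0],
          [0, 1, 1, 1],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
corner, bits = block_bitmap(region, block=2, min_pixels=2)
```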

The corner plus bit map information will be
transmitted (block 35) if the region is small; that is, if at most 3 or
4 macroblocks tile the bounding rectangle. In case the
region is larger, a more efficient coding proceeds as
follows. First, compare the bit map with the bit maps of the
previous frame; typically the previous frame has only 3
or 4 bit maps. If a bit map match is found, then compare
the associated corner, (x'_0,y'_0), of the previous frame's
bit map with (x_0,y_0). Then if (x'_0,y'_0) equals (x_0,y_0), a bit
indicating the corner and bit map matching those of the
previous frame can be transmitted instead of the full bit
map and corner. Figure 5e suggests this single bit
contour transmission.

Similarly, if a bit map match is found with a bit map
of the previous frame but the associated corner (x'_0,y'_0)
does not equal (x_0,y_0), then transmit a translation vector
[(x'_0,y'_0)-(x_0,y_0)] instead of the full bit map and corner.
This translation vector typically will be fairly small
because objects do not move too much frame-to-frame.
See Figure 5f.

Further, if a bit map match is not found, but the bit
map difference is not large, such as only 4 or 5
macroblock differences, both added and removed, then
transmit the locations of the changed macroblocks plus any
translation vector of the associated rectangle corners,
(x'_0,y'_0)-(x_0,y_0). See Figure 5g.

Lastly, for a large difference in macroblocks, just
transmit the corner (x_0,y_0) plus run length encode the bit
map along rows of macroblocks in the bounding
rectangle as illustrated in Figure 5h for transmission. Note that
large-enough holes within the region plus projections
can give rise to multiple runs in a row.
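
Run length encoding of the bit map rows can be sketched as below. The
representation of a run as a (bit value, length) pair and the function names
are assumptions made for illustration; the actual bit-level format is not
given in this section.

```python
def run_length_encode_row(bits):
    """Encode one row of the bit map as (bit value, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([b, 1])       # start a new run
    return [tuple(r) for r in runs]


def encode_bitmap(bitmap):
    """Run length encode every row of macroblock bits."""
    return [run_length_encode_row(row) for row in bitmap]


# A row with a hole gives multiple runs of 1s.
row_with_hole = [1, 1, 0, 0, 0, 1]
row_runs = run_length_encode_row(row_with_hole)
```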

(ii) Spline boundary contour representation: 

For each connected region derived in the preceding
section find corner points of the boundary contour(s),
including of any interior holes, of the region. Note that a
region of size roughly 50 pixels in diameter will have
very roughly 200-300 pixels in its boundary contour, so
use about 20% of the pixels in a contour representation.
A Catmull-Rom spline (see the Foley reference or the
Appendix) fit to the corner points approximates the
boundary.
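
A Catmull-Rom spline through a closed list of corner points can be
evaluated as sketched below. This uses the standard uniform Catmull-Rom
basis; the corner detection itself is not shown, and the names
catmull_rom_point and closed_contour and the parameter samples_per_segment
are illustrative rather than taken from the Appendix.

```python
def catmull_rom_point(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment from p1 (t=0) to p2 (t=1)."""
    def blend(a, b, c, d):
        return 0.5 * (2 * b
                      + (-a + c) * t
                      + (2 * a - 5 * b + 4 * c - d) * t * t
                      + (-a + 3 * b - 3 * c + d) * t * t * t)

    return (blend(p0[0], p1[0], p2[0], p3[0]),
            blend(p0[1], p1[1], p2[1], p3[1]))


def closed_contour(points, samples_per_segment=4):
    """Sample a closed spline that passes through every control point."""
    n = len(points)
    curve = []
    for i in range(n):
        p0, p1 = points[(i - 1) % n], points[i]
        p2, p3 = points[(i + 1) % n], points[(i + 2) % n]
        for s in range(samples_per_segment):
            curve.append(catmull_rom_point(p0, p1, p2, p3,
                                           s / samples_per_segment))
    return curve


corners = [(0, 0), (10, 0), (10, 10), (0, 10)]
curve = closed_contour(corners, samples_per_segment=4)
end_point = catmull_rom_point(corners[3], corners[0], corners[1], corners[2], 1.0)
```

Because the spline interpolates its control points, only the corner points
need to be transmitted for the decoder to rebuild the same approximate
boundary.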

MOTION ESTIMATION (Block 36)

For each connected region and bit map derived
from F_N in the preceding section, estimate the motion
vector(s) of the region as follows. First, for each 16 by
16 macroblock in F_N which corresponds to a macroblock
indicated by the bit map to be within the region,
compare this macroblock with macroblocks in the previous
reconstructed frame, F_{N-1}, which are translates of
up to 15 pixels (the search area) of this macroblock in
F_N. The comparison is the sum of the absolute differences
in the pixel intensities of the selected macroblock
in F_N and the compared macroblock in F_{N-1} with the
sum over the 256 pixels of the macroblock. The search
is performed at a sub-pixel resolution (half pixel with
interpolation for comparison) to get a good match and
extends 15 pixels in all directions. The motion vector
corresponding to the translation of the selected macroblock
of F_N to the F_{N-1} macroblock(s) with minimum sum
differences can then be taken as an estimate of the
motion of the selected macroblock. Note that use of the
same macroblock locations as in the bit map eliminates
the need to transmit an additional starting location. See
Figure 6 indicating a motion vector.
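The search just described can be sketched as a full search over integer translates. This is a simplified illustration, not the described implementation: it omits the half-pixel interpolation, skips candidate blocks that fall outside the frame, and uses hypothetical function names; frames are assumed to be lists of rows of intensities.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(cur, ref, cx, cy, size=16, search=15):
    """Full search over integer translates; returns ((dx, dy), minimum SAD).

    (cx, cy) is the top-left pixel of the macroblock in the current frame and
    (dx, dy) points from that location to the matching block in ref.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The returned minimum can be compared against a threshold to decide whether the macroblock is better coded without a motion vector.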

If the minimum sum differences defining the motion
vector is above a threshold, then none of the macroblocks
searched in F_{N-1} sufficiently matches the selected
macroblock in F_N and so do not use the motion vector
representation. Rather, simply encode the selected
macroblock as an I block (intraframe encoded in its
entirety) and not as a P block (predicted as a translation
of a block of the previous frame).

Next, for each macroblock having a motion vector,
subdivide the macroblock into four 8 by 8 blocks in F_N
and repeat the comparisons with translates of 8 by 8
blocks of F_{N-1} to find a motion vector for each 8 by 8
block. If the total number of code bits needed for the four
motion vectors of the 8 by 8 blocks is less than the
number of code bits for the motion vector of the 16 by 16
macroblock and if the weighted error is smaller with the
use of four motion vectors than with the single macroblock
motion vector, then use the 8 by 8 block motion vectors.
Average the motion vectors over all macroblocks in
F_N which are within the region to find an average motion
vector for the entire region. Then if none of the macroblock
motion vectors differs from the average motion vector
by more than a threshold, only the average motion
vector need be transmitted (block 37). Also, the average
motion vector can be used in error recovery as noted in
the following Error Concealment section.
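The averaging test can be illustrated as follows. The text does not say how the difference between a macroblock vector and the average is measured; the sketch assumes the larger of the two component differences, and the function name is hypothetical.

```python
def region_motion(vectors, threshold):
    """Return (average_vector, send_average_only) for one region's macroblocks."""
    n = len(vectors)
    avg = (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)
    # Assumed distance measure: maximum absolute component difference.
    uniform = all(max(abs(v[0] - avg[0]), abs(v[1] - avg[1])) <= threshold
                  for v in vectors)
    return avg, uniform
```

When the second returned value is true, only the average vector would be transmitted for the region.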

Thus for each connected region found in F_N by the
foregoing segmentation section, transmit the motion
vector(s) plus bit map (block 37). Typically, teleconferencing
with 176 by 144 pixel frames will require 100-150
bits to encode the shapes of the expected 2 to 4 connected
regions plus 200-300 bits for the motion vectors.

Also, the optional 8 by 8 or 16 by 16 motion vectors
and overlapped motion compensation techniques may
be used.

MOTION FAILURE REGION DETECTION (Block 38)

An approximation to F_N can be synthesized from
reconstructed F_{N-1} by use of the motion vectors plus
corresponding (macro)blocks from F_{N-1} as found in the
preceding section: for a pixel in the portion of F_N lying
outside of the difference regions found in the Segmentation
section, just use the value of the corresponding
pixel in F_{N-1}, and for a pixel in a connected region, use
the value of the corresponding pixel in the macroblock in
F_{N-1} which the motion vector translates to the macroblock
in F_N containing the pixel. The pixels in F_N with
intensities which differ by more than a threshold from
the intensity of the corresponding pixel in the approximation
synthesized by use of the motion vectors plus
corresponding (macro)blocks from F_{N-1} represent a
motion compensation failure region. To handle this
motion failure region, the intensity differences are
thresholded, next median filtered, and subjected to the
morphological close and open operations in the same
manner as the differences from F_{N-1} to F_N described in
the foregoing object detection and segmentation section.
Note that the motion failure regions will lie inside of
moving object regions; see Figure 7 as an illustration.
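A minimal sketch of the synthesis and thresholding steps follows. It stops at the thresholded difference mask; the median filtering and morphological close and open operations are not shown. The data layout (a dictionary from macroblock corner to motion vector) and the function names are assumptions for illustration.

```python
def predict_frame(prev, region_blocks, size=16):
    """Synthesize an approximation of the current frame from the previous one.

    region_blocks maps the top-left (x, y) of each macroblock in a moving
    region to its motion vector (dx, dy) pointing into prev.
    """
    pred = [row[:] for row in prev]  # outside the regions: copy prev unchanged
    for (bx, by), (dx, dy) in region_blocks.items():
        for j in range(size):
            for i in range(size):
                pred[by + j][bx + i] = prev[by + dy + j][bx + dx + i]
    return pred


def failure_mask(cur, pred, threshold):
    """1 where the prediction misses the actual intensity by more than threshold."""
    return [[1 if abs(c - p) > threshold else 0 for c, p in zip(cr, pr)]
            for cr, pr in zip(cur, pred)]
```

The resulting mask is the raw input to the filtering and morphological clean-up that yields the motion failure regions.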

If a spline boundary contour was used, then only
consider the portion of a macroblock inside the boundary
contour.

RESIDUAL SIGNAL ENCODING (Block 40)

Encode the motion failure regions as follows: tile
these motion failure regions with the 16 by 16 macroblocks
of the bit map of the foregoing boundary contour
section; this eliminates the need to transmit a starting
pixel for the tiling because it is the same as for the bit
map. This also means that the tiling moves with the
object and thus may lessen the changes.

For the motion failure regions, in each macroblock
simply apply DCT with quantization of coefficients and
runlength encoding and then Huffman encoding. See
Figure 8 showing the macroblocks within the grid.

A preferred embodiment motion failure region
encoding uses wavelets instead of DCT or DPCM. In
particular, a preferred embodiment uses a wavelet
transform on the macroblocks of the motion failure
region as illustrated in Figure 8. Recall that a wavelet
transform is traditionally a full frame transform based on
translations and dilations of a mother wavelet, Ψ(), and
a mother scaling function, Φ(); both Ψ() and Φ() are
essentially non-zero for only a few adjacent pixels,
depending upon the particular mother wavelet. Then
basis functions for a wavelet transform in one dimension
are the Ψ_{n,m}(t) = 2^{-m/2} Ψ(2^{-m} t - n) for integers n and
m. Ψ() and Φ() are chosen to make the translations and
dilations orthogonal, analogous to the orthogonality of
the sin(kt) and cos(kt), so a transform can be easily
computed by integration (summation for the discrete case).
The two dimensional transform simply uses basis functions
as the products of Ψ_{n,m}()s in each dimension. Note
that the index n denotes translations and the index m
denotes dilations. Compression arises from quantization
of the transformation coefficients analogous to
compression with DCT. See for example, Antonini et al,
Image Coding Using Wavelet Transform, 1 IEEE Tran.
Image Proc. 205 (1992) and Mallat, A Theory for
Multiresolution Signal Decomposition: The Wavelet
Representation, 11 IEEE Tran. Patt. Anal. Mach. Intel. 674
(1989) for discussion of wavelet transformations. For
discrete variables the wavelet transformation may also
be viewed as subband filtering: the filter outputs are the
reconstructions from sets of transform coefficients.
Wavelet transformations proceed by successive stages
of decomposition of an image through filterings into four
subbands: lowpass horizontally with lowpass vertically,
highpass horizontally with lowpass vertically, lowpass
horizontally with highpass vertically, and highpass both
horizontally and vertically. In the first stage the highpass
filtering is convolution with the translates Ψ_{n,1} and the
lowpass is convolution with the scaling function
translates Φ_{n,1}. At the second stage the output of the first
stage subband of lowpass in both horizontal and vertical
is again filtered into four subbands but with highpass
filtering now convolution with Ψ_{n,2} which in a sense has
half the frequency of Ψ_{n,1}; similarly, the lowpass filtering
is convolution with Φ_{n,2}. Figures 9a-b illustrate the four
subband filterings with recognition that each filtered
image can be subsampled by a factor of 2 in each
direction, so the four output images have the same number
of pixels as the original input image. The preferred
embodiments may use biorthogonal wavelets which
provide filters with linear phase. The biorthogonal
wavelets are similar to the orthogonal wavelets
described above but use two related mother wavelets
and mother scaling functions (for the decomposition
and reconstruction stages). See for example, Villasenor
et al, Filter Evaluation and Selection in Wavelet Image
Compression, IEEE Proceedings of Data Compression
Conference, Snowbird, Utah (1994) which provides
several examples of good biorthogonal wavelets. The
preferred embodiment may use the (6,2) tap filter pair from
the Villasenor paper which has low pass filter
coefficients of: h0 = 0.707107, h1 = 0.707107 and
g0 = -0.088388, g1 = 0.088388, g2 = 0.707107, g3 = 0.707107,
g4 = 0.088388, g5 = -0.088388 for the analysis and
synthesis filters.
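To make the four-subband structure concrete, the sketch below performs one decomposition stage with subsampling by 2. It is an assumption-laden simplification: it uses the two-tap Haar pair (lowpass coefficients 0.707107, 0.707107, matching h0 and h1 above) for both lowpass and highpass filtering rather than the full (6,2) pair, it assumes even image dimensions, and the function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)  # 0.707107, the two-tap lowpass coefficient


def split_1d(x):
    """One filtering stage plus subsampling by 2: returns (lowpass, highpass)."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high


def split_2d(image):
    """One decomposition stage: returns the subbands (LL, HL, LH, HH)."""
    # Horizontal filtering of every row.
    lows, highs = zip(*(split_1d(row) for row in image))

    def vertical(band):
        # Vertical filtering of every column, then back to row-major lists.
        cols = [split_1d(list(col)) for col in zip(*band)]
        low = [list(r) for r in zip(*(c[0] for c in cols))]
        high = [list(r) for r in zip(*(c[1] for c in cols))]
        return low, high

    ll, lh = vertical(lows)    # lowpass horizontally; low / high vertically
    hl, hh = vertical(highs)   # highpass horizontally; low / high vertically
    return ll, hl, lh, hh
```

A second stage would apply `split_2d` again to the LL output only, as described above; the four outputs together hold as many samples as the input.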

Preferred embodiment wavelet transforms generally
selectively code information in only regions of interest
in an image by coding only the regions in the
subbands at each stage which correspond to the original
regions of interest in the original image. See Figures
10a-c, heuristically illustrating how regions appear in
the subband filtered outputs. This approach avoids
spending bits outside of the regions of interest and
improves video quality. The specific use for motion failure
regions is a special case of only encoding regions of
interest. Note that the thesis of H. J. Barnard ("Image
and Video Coding Using a Wavelet Decomposition",
Technische Universiteit Delft, 1994) segments an image
into relatively homogeneous regions and then uses
different wavelet transforms to code each region and only
considered single images, not video sequences.
Barnard's method also requires the wavelet transformation
be modified for each region shape: this adds complexity
to the filtering stage and the coding stage. The preferred
embodiments use a single filtering transform. Further,
the preferred embodiment applies to regions of interest,
not just homogeneous regions as in Barnard and which
fill up the entire frame.

The preferred embodiments represent regions of
interest with an image map. The map represents which
pixels in a given image lie within the regions of interest.
The simplest form is a binary map representing to be
coded or not to be coded. If more than two values are
used in the map, then varying priorities can be given to
different regions. This map must also be transmitted to
the decoder as side information. For efficiency, the map
information can be combined with other side information
such as motion compensation.

The map is used during quantization. Since the
wavelets decompose the image into subbands, the first
step is to transfer the map to the subband structure (that
is, determine which locations in the subband output
images correspond to the original map). This produces
a set of subregions in the subbands to be coded.
Figures 10a-c show the subregions: Figure 10a shows the
original image map with the regions of interest shown,
and Figure 10b shows the four subband outputs with the
corresponding regions of interest to be coded after one
stage of decomposition. Figure 10c shows the subband
structure after two stages and with the regions of
interest.
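One simple way to transfer a binary map to the subband structure is sketched below. The rule used here, marking a subband location when any of the four source locations it was subsampled from is of interest, is an assumption for illustration; it ignores the extent of the filter taps, and the function names are hypothetical.

```python
def downsample_map(roi_map):
    """Map for one subband: 1 if any of the 2x2 source locations is of interest."""
    return [[1 if (roi_map[y][x] or roi_map[y][x + 1]
                   or roi_map[y + 1][x] or roi_map[y + 1][x + 1]) else 0
             for x in range(0, len(roi_map[0]) - 1, 2)]
            for y in range(0, len(roi_map) - 1, 2)]


def subband_maps(roi_map, stages):
    """List of per-stage maps; entry k applies to each subband of stage k + 1."""
    maps = []
    for _ in range(stages):
        roi_map = downsample_map(roi_map)
        maps.append(roi_map)
    return maps
```

The same reduced map is applied to all four subbands produced at a given stage.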

The preferred embodiment first sets the pixels
outside of the regions of interest to 0 and then applies the
wavelet decomposition (subband filtering stages). After
decomposition and during the quantization of the wavelet
transform coefficients, the encoder only sends
information about values that lie within the subregions of
interest to be coded. The quantization of coefficients
provides compression analogous to DCT transform
coefficient quantization. Experiments show that the
video quality increases with compression using the
regions of interest approach as compared to not using
it.

There is some slight sacrifice made in representing
the values near the edges of the selected regions of
interest because the wavelet filtering process will smear
the information somewhat and any information that
smears outside the region of interest boundary is lost.
This means that there is no guarantee of perfect
reconstruction for values inside the region of interest even if
the values in the regions of interest were perfectly
coded. In practice, this does not seem to be a severe
hardship because the level of quantization required for
typical compression applications means that the images
are far from any perfect reconstruction levels anyway
and the small effect near the edges can be ignored for
all practical purposes.

The preferred embodiments may use the zerotree
quantization method for the transform coefficients. See
Shapiro, Embedded Image Coding Using Zerotrees of
Wavelet Coefficients, 41 IEEE Trans. Sig. Proc. 3445
(1993) for details of the zerotree method applied to
single images. The zerotree method implies that only the
zerotrees that lie within the subregions of interest are
coded. Of course, other quantization methods could be
used instead of zerotree. Figure 11 illustrates the
zerotree relations.

In applications the regions of interest can be
selected in many ways, such as areas that contain large
numbers of errors (such as quantizing video after
motion compensation) or areas corresponding to
perceptually important image features (such as faces) or
objects for scalable compression. Having the ability to
select regions is especially useful in motion compensated
video coding where the residual images to be
quantized typically contain information concentrated in
areas of motion rather than uniformly spread over the
frame.

Regions of interest can be selected as macroblocks
which have errors that exceed a threshold after motion
compensation. This application essentially combines
region of interest map information with motion
compensation information. Further, the regions of interest could
be macroblocks covering objects and their motion failure
regions as described in the foregoing.

Figure 12 illustrates a video compressor 50 using
the wavelet transform on regions of interest.

An alternative preferred embodiment uses a wavelet
transform on the motion failure region macroblocks
and these may be aligned with the rectangular grid.

(1) Initially, encode the zeroth frame F_0 as an I picture.
Compute the multi-level decomposition of the
entire frame; quantize and encode the resulting
wavelet coefficients, and transmit. The preferred
embodiment uses the zerotree method of quantization
and encoding. Any subsequent frame F_N that is
to be an I picture can be encoded in the same manner.

(2) For each frame encoded as a P picture (not an I
picture), perform motion compensation (block 52)
on the input frame by comparing the pixel values in
the frame with pixel values in the previous
reconstructed frame. The resulting predicted frame is
subtracted from the input frame to produce a residual
image (difference between predicted and actual
pixel values). The motion compensation can be
done using the segmentation approach described
earlier or simply on a block by block basis (as in
H.263). The resulting motion vector information is
coded and transmitted (block 53).

(3) For each residual image computed in step (2),
determine the region or regions of interest (block
54) that require additional information to be sent.
This can be done using the motion failure approach
described earlier or simply on a macroblock basis
by comparing the sum of the squared residual values
in a macroblock to a threshold and including
only those macroblocks above the threshold in the
region of interest. This step produces a region of
interest map. This map is coded and transmitted
(block 55). Because the map information is correlated
with the motion vector information in (2), an
alternative preferred embodiment codes and transmits
the motion vector and map information
together to reduce the number of bits required.

(4) Using the residual image computed in step (2)
and the region of interest map produced in (3), values
in the residual images that correspond to locations
outside the region of interest map can be set
to zero (block 56). This insures that values outside
the region of interest will not affect values within the
region of interest after wavelet decomposition. Step
(4) is optional and may not be appropriate if the
region based wavelet approach is applied to something
besides motion compensated residuals.

(5) The traditional multi-level wavelet decomposition
(block 58) is applied to the image computed in
(4). The number of filtering operations can be
reduced (at the cost of more complexity) by performing
the filtering only within the regions of interest.
However, because of the zeroing from (4), the
same results will be obtained by performing the
filtering on the entire image which simplifies the
filtering stage.

(6) The decomposed image produced in (5) is next
quantized and encoded (block 60). The region of
interest map is used to specify which corresponding
wavelet coefficients in the decomposed subbands
are to be considered. Figure 10 shows how the
region of interest map is used to indicate which
subregions in the subbands are to be coded. Next, all
coefficients within the subregions of interest are
quantized and encoded (block 60). The preferred
embodiment uses a modification of the zerotree
approach by Shapiro, which combines correlation
between subbands, scalar quantization and arithmetic
coding. The zerotree approach is applied to
those coefficients within the subregions of interest.
Other quantization and coding approaches could
also be used if modified to only code coefficients
within the subregions of interest. The output bits of
the quantization and encoding step are then transmitted
(block 59). The resulting quantized decomposed
image is used in step (7).

(7) The traditional multi-level wavelet reconstruction
(block 62) is applied to the quantized decomposed
image from (6). The number of filtering operations
can be reduced (at the cost of more complexity) by
performing the filtering only within the regions of
interest. However, because of the zeroing from (4),
the same results will be obtained by performing the
filtering on the entire image which simplifies the
filtering stage.

(8) As in (4), the reconstructed residual image
computed in (7) and the region of interest map
produced in (3) can be used to zero values in the
reconstructed residual image that correspond to
locations outside the region of interest map (block
64). This insures that values outside the region of
interest will not be modified when the reconstructed
residual is added to the predicted image. Step (8) is
optional and may not be appropriate if the region
based wavelet approach is applied to something
besides motion compensated residuals.

(9) The resulting residual image from (8) is added to
the predicted frame from (2) (block 66) to produce
the reconstructed frame (this is what the decoder
will decode). The reconstructed frame is stored in a
frame memory (block 68) to be used for motion
compensation for the next frame.
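The simple macroblock variant of step (3) and the zeroing of steps (4) and (8) can be sketched as follows. The function names are hypothetical, the residual is assumed to be a list of rows, and the frame dimensions are assumed to be multiples of the macroblock size.

```python
def roi_macroblocks(residual, threshold, size=16):
    """Step (3): flag macroblocks whose sum of squared residuals exceeds threshold."""
    rows, cols = len(residual) // size, len(residual[0]) // size
    return [[1 if sum(residual[by * size + j][bx * size + i] ** 2
                      for j in range(size) for i in range(size)) > threshold else 0
             for bx in range(cols)]
            for by in range(rows)]


def zero_outside(image, roi, size=16):
    """Steps (4) and (8): zero every value outside the flagged macroblocks."""
    return [[v if roi[y // size][x // size] else 0 for x, v in enumerate(row)]
            for y, row in enumerate(image)]
```

Because the map is defined per macroblock, it needs only one bit per macroblock as side information before any joint coding with the motion vectors.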

More generally, subband filtering of other types
such as QMF and Johnston could be used in place of
the wavelet filtering provided that the region of interest
based approach is maintained.



The object oriented approach of the preferred
embodiments permits scalability. Scalable compression
refers to the construction of a compressed video bit
stream that can have a subset of the encoded information
removed, for example all of the objects representing
a particular person, and the remaining bitstream will still
decode correctly, that is, without the removed person,
as if the person were never in the video scenes. The
removal must occur without decoding or recoding any
objects. Note that the objects may be of different types,
such as "enhancement" objects, whose loss would not
remove the object from the scene, but rather just lower
the quality of its visual appearance or omit audio or
other data linked to the object.

The preferred embodiment scalable object-based
video coding proceeds as follows:

Presume an input video sequence of frames
together with a segmentation mask for each frame; the
mask delineates which pixels belong to which objects.
Such a mask can be developed by difference regions
together with inverse motion vectors for determining
uncovered background plus tracking through frames of
the connected regions, including mergers and separations,
of the mask for object identification. See the
background references. The frames are coded as I frames
and P frames with the initial frame being an I frame and
other I frames may occur at regular or irregular intervals
thereafter. The intervening frames are P frames and rely
on prediction from the closest preceding I frame. For an
I frame define the "I objects" as the objects the
segmentation mask identifies; the I objects are not just in the I
frames but may persist into the P frames. Figures 13a-b
illustrate a first frame plus its segmentation mask.

Encode an I frame by first forming an inverse image
of the segmentation mask. Then this image is blocked
(covered with a minimal number of 16 by 16 macroblocks
aligned on a grid), and the blocked image is used
as a mask to extract the background image from the
frame. See Figures 13c-d illustrating the blocked image
and the extracted background.

Next, the blocked mask is efficiently encoded, such
as by the differential contour encoding of the foregoing
description. These mask bits are put into the output
bitstream as part of object #0 (the background object).

Then the extracted background is efficiently
encoded, such as by DCT encoded 16 by 16 macroblocks
as in the foregoing. These bits are put into the
output bitstream as part of object #0.

Further, for each object in the frame, the segmentation
mask for that object is blocked and encoded, and
that object is extracted from the first frame via the blocked
mask and encoded, as was done for the background
image. See Figures 13e-f illustrating the blocked object
mask and extracted object. The blocked mask and
extracted object are encoded in the same manner as
the background and the bits put into the output
bitstream.

As each object is put into the bitstream it is
preceded by a header of fixed length wherein the object
number, object type (such as I object) and object length
(in bits) are recorded.

After all of the objects have been coded, a
reconstructed frame is made, combining decoded images of
the background and each object into one frame. This
reconstructed frame is the same frame that will be
produced by the decoder if it decodes all of the objects.
Note that overlapping macroblocks (from different
objects) will be the same, so the reconstruction will not
be ambiguous. See Figures 13g-i illustrating the
reconstructed background and objects and frame.

An average frame is calculated from the
reconstructed frame. An average pixel value is calculated for
each channel (e.g., luminance, blue, and red) in the
reconstructed frame and those pixel values are
replicated in their channels to create the average frame. The
three average pixel values are written to the output
bitstream. This completes the I frame encoding.
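The average frame computation amounts to one mean per channel. In the sketch below the channel names, the dictionary layout, and the rounding to an integer are assumptions for illustration.

```python
def average_frame(frame):
    """frame maps a channel name to a 2D list; returns (averages, average frame)."""
    averages = {}
    flat = {}
    for name, plane in frame.items():
        total = sum(sum(row) for row in plane)
        count = sum(len(row) for row in plane)
        averages[name] = round(total / count)  # assumed integer pixel values
        # Replicate the channel average over the whole plane.
        flat[name] = [[averages[name]] * len(row) for row in plane]
    return averages, flat
```

Only the three averages need to be written to the bitstream, since the decoder can replicate them in the same way.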

Following the I frame, each subsequent frame of
the video sequence is encoded as a P frame until the
next, if any, I frame. The "P" stands for "predicted" and
refers to the fact that the P frame is predicted from the
frame preceding it (I frames are coded only with respect
to themselves). Note that there is no requirement in the
encoder that every frame of the input is encoded; every
third frame of a 30 Hz sequence could be coded to
produce a 10 Hz sequence.

As with the I frame, for a P frame block the
segmentation mask for each object and extract the object. See
Figures 13j-m showing a P frame, a segmentation
mask, an object mask, the blocked object mask, and the
extracted object, respectively. Do not use object #0 (the
background) because it should not be changing and
should not need prediction.

Next, each of the extracted objects is differenced
with its reconstructed version in the previous frame. The
block mask is then adjusted to reflect any holes that
might have opened up in the differenced image; that is,
the reconstructed object may closely match a portion of
the object so the difference may be below threshold in
an area within the segmentation mask, and this part
need not be separately encoded. See Figures 13n-o
showing the object difference and the adjusted block
mask, respectively. Then the block mask is efficiently
encoded and put into the output bitstream.

To have a truly object-scalable bitstream the motion
vectors corresponding to the blocks tiling each of the
objects should only point to locations within the previous
position of this object. Hence in forming this bitstream,
for each of the objects to be coded in the current image,
the encoder forms a separate reconstructed image with
only the reconstructed version of this object in the
previous frame and all other objects and background
removed. The motion vectors for the current object are
estimated with respect to this image. Before performing
the motion estimation, all the other areas of the
reconstructed image where the object is not defined (non
mask areas) are filled with an average background
value to get a good motion estimation at the block
boundaries. This average value can be different for
each of the objects and can be transmitted in the
bitstream for use by the decoder. Figure 13p shows an
image of a reconstructed object with the average value
in the non mask areas. This is the image used for
motion estimation. The calculated motion vectors are
then efficiently encoded and put in the bitstream.

Then the differences between the motion compensated
object and the current object are DCT (or wavelet)
encoded on a macroblock basis. If the differences do
not meet a threshold, then they are not coded, down to
an 8 by 8 pixel granularity. Also, during motion
estimation, some blocks could be designated INTRA blocks
(as in an I frame and as opposed to INTER blocks for P
frames) if the motion estimation calculated that it could
not do a good job on that block. INTRA blocks do not
have motion vectors, and their DCT coding is only with
respect to the current block, not a difference with a
compensated object block. See Figures 13q-r illustrating the
blocks which were DCT coded (INTRA blocks).

Next, the uncovered background that the object's
motion created (with respect to the object's position in
the previous frame) is calculated and coded as a separate
object for the bitstream. This separate treatment of
the uncovered background (along with the per object
motion compensation) is what makes the bitstream
scalable (for video objects). The bitstream can be
played as created; the object and its uncovered
background can be removed to excise the object from the
playback, or just the object can be extracted to play on
its own or to be added to a different bitstream.

To calculate the uncovered background, the
object's original (not blocked) segmentation masks are
differenced such that all of the pixels in the previous
mask belonging to the current mask are removed. The
resulting image is then blocked and the blocks used as
a mask to extract the uncovered background from the
current image. See Figures 13s-u illustrating the
uncovered background pixels, a block mask for the pixels and
the image within the mask.
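The mask differencing and blocking can be sketched as below, assuming binary masks stored as lists of rows; the function names are hypothetical. Blocking marks every grid-aligned block that contains at least one set pixel, which is the minimal grid-aligned cover.

```python
def uncovered_pixels(prev_mask, cur_mask):
    """Pixels the object covered in the previous frame but not in the current one."""
    return [[1 if p and not c else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev_mask, cur_mask)]


def block_mask(mask, size=16):
    """Cover the set pixels with grid-aligned size-by-size blocks."""
    blocked = [[0] * len(row) for row in mask]
    for y, row in enumerate(mask):
        for x, bit in enumerate(row):
            if bit:
                by, bx = y - y % size, x - x % size  # top-left of enclosing block
                for j in range(by, min(by + size, len(mask))):
                    for i in range(bx, min(bx + size, len(row))):
                        blocked[j][i] = 1
    return blocked
```

The blocked result is then used as a mask on the current image to extract the uncovered background.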

The uncovered background image is DCT encoded
as INTRA blocks (making the uncovered background
objects I objects). See Figure 13v for the reconstructed
frame.

Decoding the bitstream for the scalable object-based
video works in the same manner as the previously
described decoder except that it decodes an
object at a time instead of a frame at a time. When
dropping objects, the decoder merely reads the object
header to find out how many bits long it is, reads that
many bits, and throws them away.
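Dropping objects by header alone can be sketched as follows. The field widths (8, 4, and 20 bits), the function names, and the representation of the bitstream as a string of '0' and '1' characters are all assumptions for illustration; the text only states that the header has a fixed length and records the object number, type, and length in bits.

```python
HEADER_FIELDS = (("number", 8), ("type", 4), ("length", 20))  # hypothetical widths
HEADER_BITS = sum(width for _, width in HEADER_FIELDS)


def pack_object(number, obj_type, payload_bits):
    """Prefix a payload (a string of '0'/'1') with its fixed-length header."""
    values = (number, obj_type, len(payload_bits))
    header = "".join(format(v, "0{}b".format(w))
                     for v, (_, w) in zip(values, HEADER_FIELDS))
    return header + payload_bits


def drop_objects(bitstream, unwanted):
    """Copy the bitstream, skipping every object whose number is in unwanted."""
    out, pos = [], 0
    while pos < len(bitstream):
        header, start = {}, pos
        for name, width in HEADER_FIELDS:
            header[name] = int(bitstream[pos:pos + width], 2)
            pos += width
        pos += header["length"]  # jump over the payload without decoding it
        if header["number"] not in unwanted:
            out.append(bitstream[start:pos])
    return "".join(out)
```

No payload is decoded or recoded during removal, which is the property required of the scalable bitstream.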

Further, quality scalability can also be achieved by
providing an additional enhancement bitstream
associated with each object. By decoding and using the
enhancement bitstream the quality of the selected
objects can be improved. If the channel bandwidth does
not allow for the transmission of this enhanced
bitstream it can be dropped at the encoder. Alternately the
decoder may also optimize its performance by choosing
to drop the enhancement bitstreams associated with
certain objects if the application does not need them.
The enhancement bitstream corresponding to a particular
object is generated at the encoder by computing the
differences between the object in the current frame and
the final reconstructed object (after motion failure region
encoding) and again DCT (or wavelet) encoding these
differences with a lower quantization factor. Note that
the reconstructed image should not be modified with
these differences for the bitstream to remain scalable;
i.e., the encoder and decoder remain in synchronization
even if the enhancement bitstreams for certain objects
are dropped.

Figures 14a-b illustrate the preferred embodiment
object removal: the person on the left in Figure 14a has
been removed in Figure 14b.

ERROR CONCEALMENT 

The foregoing object-oriented methods compress a
video sequence by detecting moving objects (or difference
regions which may include both object and uncovered
background) in each frame and separating them
from the stationary background. The shape, content
and motion of these objects can then be efficiently
coded using motion compensation and the differences,
if any, using DCT or wavelets. When this compressed
data is subjected to channel errors, the decoder loses
synchronization with the encoder, which manifests itself
in a catastrophic loss of picture quality. Therefore, to
enable the decoder to regain synchronization, the
preferred embodiment resynchronization words can be
inserted into the bitstream. These resynchronization
words are introduced at the start of the data for an I
frame and at the start of each of the codes for the following
items for every detected moving object in a P frame in
addition to the start of the P frame:



(i) the boundary contour data (bitmap or spline);
(ii) the motion vector data; and
(iii) the DCT data for the motion failure regions.

Further, if control data or other data is also
included, then this data can also have resynchronization
words. The resynchronization words are characterized
by the fact that they are unique; i.e., they are
different from any given sequence of coded bits of the
same length because they are not in the Huffman code
table which is a static table. For example, if a P frame
had three moving objects, then the sequence would
look like:

(i) frame begin resynchronization word
(ii) contour resynchronization word
(iii) first object's contour data (e.g., bitmap or spline)
(iv) motion vector resynchronization word
(v) first object's motion vectors (related to bitmap
macroblocks)
(vi) DCT/wavelet resynchronization word
(vii) first object's motion failure data
(viii) contour resynchronization word
(ix) second object's contour data
(x) motion vector resynchronization word
(xi) second object's motion vectors
(xii) DCT/wavelet resynchronization word
(xiii) second object's motion failure data
(xiv) contour resynchronization word
(xv) third object's contour data
(xvi) motion vector resynchronization word
(xvii) third object's motion vectors data
(xviii) DCT/wavelet resynchronization word
(xix) third object's motion failure data



These resynchronization words also help the
decoder in detecting errors.

Once the decoder detects an error in the received
bitstream, it tries to find the nearest resynchronization
word. Thus the decoder reestablishes synchronization
at the earliest possible time with a minimal loss of coded
data.
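A minimal sketch of the resynchronization search follows. It assumes the bitstream is held as a string of '0' and '1' characters, that the decoder scans forward from the position where the error was detected, and that the resynchronization word is a 16-bit pattern; the pattern itself is hypothetical, since the text only requires that it not occur in the Huffman code table.

```python
# Hypothetical 16-bit resynchronization pattern (15 zeros followed by a one).
RESYNC_WORD = "0" * 15 + "1"


def resynchronize(bits, error_pos, word=RESYNC_WORD):
    """Return the index of the first bit after the next resynchronization word.

    bits      -- received bitstream as a string of '0'/'1' characters
    error_pos -- position at which the decoder detected an error
    Returns None when no resynchronization word follows error_pos.
    """
    index = bits.find(word, error_pos)
    if index < 0:
        return None
    return index + len(word)


if __name__ == "__main__":
    stream = "1101" + RESYNC_WORD + "1011"
    print(resynchronize(stream, 2))
```

Everything between the error position and the returned index is discarded, which is the "minimal loss of coded data" referred to above.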

An error may be detected at the decoder if any of
the following conditions is observed:

(i) an invalid codeword is found;

(ii) an invalid mode is detected while decoding;

(iii) the resynchronization word does not follow a
decoded block of data;

(iv) a motion vector points outside of the frame;

(v) a decoded DCT value lies outside of permissible
limits; or

(vi) the boundary contour is invalid (lies outside of
the image).

If an error is detected in the boundary contour data,
then the contour is dropped and is made a part of the
background; this means the corresponding region of the
previous frame is used. This reduces some distortion
because there often is a lot of temporal correlation in the
video sequence.

If an error is detected in the motion vector data,
then the average motion vector for the object is applied
to the entire object rather than each macroblock using
its own motion vector. This relies on the fact that there is
large spatial correlation in a given frame; therefore,
most of the motion vectors of a given object are
approximately the same. Thus the average motion vector
applied to the various macroblocks of the object will be
a good approximation and help reduce visual distortion
significantly.

If an error is detected in the motion failure region
DCT data, then all of the DCT coefficients are set to
zero and the decoder attempts to resynchronize.
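The motion vector and DCT concealment rules can be sketched as below. The text does not say which vectors enter the average, so the sketch assumes the average is taken over whatever vectors of the object are available to the decoder (for example, those decoded before the error); the function names, the rounding to integer displacements, and the 8x8 block size are likewise assumptions.

```python
def conceal_motion_vectors(available_vectors, num_macroblocks):
    """Apply one average motion vector to every macroblock of an object.

    available_vectors -- list of (dx, dy) pairs assumed usable for averaging
    num_macroblocks   -- number of macroblocks in the object
    """
    if not available_vectors:
        average = (0, 0)  # assumption: fall back to zero motion
    else:
        count = len(available_vectors)
        average = (
            round(sum(v[0] for v in available_vectors) / count),
            round(sum(v[1] for v in available_vectors) / count),
        )
    return [average] * num_macroblocks


def conceal_dct_block(rows=8, cols=8):
    """Return a block of DCT coefficients with every coefficient set to zero."""
    return [[0] * cols for _ in range(rows)]


if __name__ == "__main__":
    print(conceal_motion_vectors([(2, 1), (4, 1), (3, 1)], 4))
```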

ERROR CORRECTION 

The error control code of the preferred embodiments
comprises two Reed-Solomon (RS) coders with
an interleaver in between as illustrated in Figure 15a.
The bitstream to be transmitted is partitioned into
groups of 6 successive bits to form the symbols for the
RS coders. This will apply generally to transmission
over a channel with burst errors in addition to random
errors. The interleaver mixes up the symbols from several
codewords so that the symbols from any given
codeword are well separated during transmission.
When the codewords are reconstructed by the
deinterleaver in the receiver, error bursts introduced by the
channel are effectively broken up and spread across
several codewords. The interleaver-deinterleaver pair
thus transforms burst errors into effectively random
errors. The delay multiplier m is chosen so that the
overall delay is less than 250 msec.

Each of the RS coders uses an RS code over the
Galois field GF(64) and maps a block of 6-bit information
symbols into a larger block of 6-bit codeword symbols.
The first RS coder codes an input block of k 6-bit
information symbols as n2 6-bit symbols and feeds these to
the interleaver, and the second RS coder takes the
output of the interleaver and maps the n2 6-bit symbols into
n1 6-bit codeword symbols; n1 - n2 = 4.

At the receiver, each block of n1 6-bit symbols is fed
to a decoder for the second coder. This RS decoder,
though capable of correcting up to 2 6-bit symbol errors,
is set to correct single errors only. When it detects any
higher number of errors, it outputs n2 erased symbols.
The deinterleaver spreads these erasures over n2
codewords which are then input to the decoder for the first
RS coder. This decoder can correct any combination of
E errors and S erasures such that 2E+S <= n2-k. If
2E+S is greater than the above number, then the data is
output as is and the erasures in the data, if any, are
noted by the decoder.
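The errors-and-erasures condition can be checked with a one-line test. The sketch below only evaluates the inequality 2E+S <= n2-k for given counts; it does not implement Reed-Solomon decoding, and the function name is an assumption.

```python
def within_correction_capability(num_errors, num_erasures, n2, k):
    """True when E errors and S erasures satisfy 2E + S <= n2 - k."""
    return 2 * num_errors + num_erasures <= n2 - k


if __name__ == "__main__":
    # With (k, n2) = (26, 30) there are 4 parity symbols.
    print(within_correction_capability(1, 2, n2=30, k=26))  # 2*1 + 2 = 4
    print(within_correction_capability(2, 1, n2=30, k=26))  # 2*2 + 1 = 5
```

With 4 parity symbols, one error plus two erasures is within capability, while two errors plus one erasure is not, which is why converting detected-but-uncorrectable words into erasures is worthwhile.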

The performance of the preferred embodiment
error correction exceeds the simple correction so far
described by further adding a feedback from the second
decoder (after the deinterleaver) to the first decoder,
thereby improving the error correction of the first decoder.
In particular, assume that the first decoder can correct E
errors and detects (and erases) T errors. Also presume
the second decoder can correct S erasures in any given
block of N2 symbols. Further, assume that at time t the
first decoder detects X errors in the input block B, which
consists of N1 6-bit symbols, with X > E; this implies a
decoding failure at time t. This decoding failure results in the
first decoder outputting N2 erased symbols. The
preferred embodiment error correction system as
illustrated in Figure 15b includes a buffer to store the input
block B of N1 symbols and the time t at which the
decoding failure occurred; this will be used in the feedback
described below. The deinterleaver takes the N2 erased
symbol block output of the first decoder and spreads out
the erased symbols over the next N2 blocks: one erased
symbol per block. Thus the erased symbols from block
B appear at the second decoder at times t, t+d, t+2d, ...
t+(N2-1)d where d is the delay increment of the
deinterleaver and relates to the block length.

Consider the time t. If the number of erased
symbols in the input block to the second decoder at time t is
less than or equal to S, then the second decoder can
correct all the erasures in this input block. One of the
corrected erasures derives from the input block B to the
first decoder at time t. This corrected erasure can be
either (1) one of the symbols of the input block B which
was an error detected by the first decoder or (2) was not
one of the symbols in error in block B but was erased
due to the decoding failure.

Compare the corrected erasure with the contents of
the corresponding location in block B, which has been
stored in the buffer. If the corrected erasure is the same
as the corresponding contents of stored block B, then
the corrected erased symbol was of category (2) and
this output of the second decoder is used without any
modification. However, if the corrected erased symbol
does not match the contents of the corresponding
location in block B, then this corresponding location symbol
was one of the error symbols in block B. Thus this error
has been corrected by the second decoder and this
correction can be made in block B as stored in the buffer;
that is, an originally uncorrectable error in block B for the
first decoder has been corrected in the stored copy of
block B by a feedback from the second decoder. This
reduces the number of errors X that would be detected
by the first decoder if the thus corrected block B were
again input to the first decoder. Repeat this erasure
correcting by the second decoder at later times t+id (i = 1,
..., (N2-1)) which correspond to the erasures derived
from B; this may reduce the number of errors detectable
in block B to X-Y. Once X-Y is less than E, all of the
remaining errors in the now corrected input block B can
be corrected, and the deinterleaver can be updated
with the thus corrected input block B. This reduces the
number of erased symbols being passed to the second
decoder at subsequent times, thereby increasing
the overall probability of error correction. Contrarily, if it
is not possible to correct all of the errors in the input
block B, then the corrections made by the second
decoder are used without modification. Note that if an
extension of the overall delay were tolerable, then the
corrected block B could be reinput to the first decoder.
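The compare-and-substitute step of the feedback can be sketched as follows. This is a simplified illustration rather than the preferred embodiment's implementation: symbols are plain Python values, the feedback is given as a mapping from positions in the stored block B to the symbols corrected by the second decoder, and all names are assumptions. The retry test uses "at most E" remaining errors, which is an assumption consistent with the Figure 16b example, where a decoder that corrects one error succeeds once a single error remains.

```python
def apply_feedback(stored_block, corrected_symbols):
    """Substitute second-decoder corrections into the stored block B.

    stored_block      -- symbols of block B as buffered at the failure time
    corrected_symbols -- dict mapping position in B to the corrected symbol
    Returns (updated_block, number_of_positions_changed).
    """
    block = list(stored_block)
    changed = 0
    for position, symbol in corrected_symbols.items():
        if block[position] != symbol:
            # Mismatch: this stored symbol was one of the errors in B.
            block[position] = symbol
            changed += 1
        # Match: the symbol was only erased by the decoding failure.
    return block, changed


def retry_possible(x_errors, y_fixed, e_correctable):
    """True when the remaining errors X - Y fit the first decoder (assumed <= E)."""
    return x_errors - y_fixed <= e_correctable


if __name__ == "__main__":
    # Word A2,B1,P0,Q2 with A2 and B1 received in error (marked with '*').
    stored = ["A2*", "B1*", "P0", "Q2"]
    updated, fixed = apply_feedback(stored, {1: "B1"})
    print(updated, fixed, retry_possible(2, fixed, 1))
```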

Simulations show that the foregoing channel coding
is capable of correcting all burst lengths of duration less
than 24 msec at transmission rates of 24 Kbps and 48
Kbps.

In the case of random errors of probability 0.001, for
choices of (k,n2,n1) equal to (24,28,32), (26,30,34),
(27,31,34), and (28,32,36) the decoded bit error rate
was less than 0.00000125, 0.000007, and 0.0000285,
respectively, with multiplier m=1. Similarly, for m=2,
(38,43,48) may be used. Note that the overall delay
depends upon the codeword size due to the interleaver
delays. In fact, the overall delay is

delay = (m n2)^2 (6/bitrate)

where the 6 comes from the use of 6-bit symbols and
the second power arises because the number of symbols in the
codewords determines both the number of delays and the
increment between delays. Of course, the number of
parity symbols (n1-n2 and n2-k) used depends upon the
bit error rate performance desired and the overall delay.
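As a rough numerical check, the delay expression can be evaluated directly. The sketch reads the expression as delay = (m n2)^2 (6/bitrate), which is an interpretation of the printed formula; the function name and the choice of seconds as the unit are assumptions.

```python
def overall_delay_seconds(m, n2, bitrate_bps, bits_per_symbol=6):
    """Overall interleaver delay, read as (m * n2)**2 * (6 / bitrate)."""
    return (m * n2) ** 2 * bits_per_symbol / bitrate_bps


if __name__ == "__main__":
    # m = 1, n2 = 30 at 24 Kbps and 48 Kbps
    print(overall_delay_seconds(1, 30, 24_000))
    print(overall_delay_seconds(1, 30, 48_000))
```

Under this reading, m = 1 and n2 = 30 give 225 msec at 24 Kbps, which is within the 250 msec bound stated earlier.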

In our simulations with a bitstream of 3604480 6-bit
symbols, at a probability of error of 1e-3, the number of
erasures without feedback is 46/3604480 6-bit symbols
(1.28e-5). With feedback the number of erasures is
24/3604480 6-bit symbols (6.66e-6). For the
combination of burst errors and random errors, the number of
erasures without feedback is 135/3604480 (3.75e-5) and
with feedback the number of erasures is 118/2703360
6-bit symbols (3.27e-5).

Figures 16a-b are heuristic examples illustrating
the feedback error correction. In particular, the first row
in Figure 16a shows a sequence of symbols
A1,B1,A2,B2,... which would be the information
bitstream to be transmitted; each symbol would be a group
of successive bits (e.g., 6 bits). For simplicity of
illustration, the first coder is presumed to encode two
information symbols as a three-symbol codeword; i.e., A1,B1
encodes as A1,B1,P1 with P1 being a parity symbol.
This is analogous to the 26 information symbols
encoded as 30 symbols with 4 parity symbols as in one
of the foregoing preferred embodiments.

The second row of Figure 16a shows the
codewords. The interleaver spreads out the symbols by
delays as shown in the second and third rows of Figure
16a. In detail, the Aj symbols have no delays, the Bj
symbols have delays of 3 symbols, and the Pj symbols have
delays of 6 symbols. The slanting arrows in Figure 16a
indicate the delays.

The interleaver output (sequence of 3-symbol
words) is encoded by the second encoding as 4-symbol
codewords. The fourth row of Figure 16a illustrates the
second encoding of the 3-symbol words of the third row
by adding a parity symbol Qj to form a 4-symbol
codeword.



Row five of Figure 16a indicates three exemplary
transmission errors by way of the X's over the symbols
A3, P1, and B3. Presume for simplicity that the decoders
can correct one error per codeword or can detect two
errors and erase the codeword symbols. Row 6 of
Figure 16a shows the decoding to correct the error in
symbol B3 and erase the A3,B2,P1 word as indicated
by O's over the symbols.

The deinterleaver reassembles the 3-symbol
codewords by delays which are complementary to the
interleaver delays: the Aj symbols have delays of 6 symbols,
the Bj symbols have delays of 3 symbols, and the Pj
symbols have no delays. Rows 6-7 show the delays with
slanting arrows. Note the erased symbols spread out in
the deinterleaving.
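The complementary delays of the Figure 16a example can be sketched with a small column-delay routine. The sketch assumes that a delay of 3 symbols corresponds to one 3-symbol word period, so the A, B, and P columns are delayed by 0, 1, and 2 words in the interleaver and by 2, 1, and 0 words in the deinterleaver; the function names and the use of None for empty slots are assumptions.

```python
def delay_columns(words, delays, fill=None):
    """Delay column c of a stream of equal-length words by delays[c] word periods."""
    total = len(words) + max(delays)
    output = []
    for t in range(total):
        row = []
        for column, delay in enumerate(delays):
            source = t - delay
            in_range = 0 <= source < len(words)
            row.append(words[source][column] if in_range else fill)
        output.append(tuple(row))
    return output


def interleave(codewords):
    """Delay column c by c word periods (A: 0, B: 1, P: 2 in the example)."""
    width = len(codewords[0])
    return delay_columns(codewords, list(range(width)))


def deinterleave(words):
    """Apply the complementary delays (A: 2, B: 1, P: 0 in the example)."""
    width = len(words[0])
    return delay_columns(words, list(range(width - 1, -1, -1)))


if __name__ == "__main__":
    codewords = [("A1", "B1", "P1"), ("A2", "B2", "P2"), ("A3", "B3", "P3")]
    sent = interleave(codewords)
    print(sent[2])                   # the A3,B2,P1 word of the example
    print(deinterleave(sent)[2:5])   # original codewords, two words late
```

Because each transmitted word carries symbols from three different codewords, erasing one transmitted word costs each of those codewords only a single symbol.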

Figure 16a row 8 illustrates the second decoder
correcting the erased symbols to recover the
A1,B1,A2,B2,... information.

Figure 16b illustrates the same arrangement as
Figure 16a but with an additional error which can only
be corrected by use of the preferred embodiment
feedback to the deinterleaver. In particular, row 5 of Figure
16b shows 6 errors depicted as X's over the symbols
A2, B1, A3, P1, B3, and A4. In this case the first
decoder detects two errors in each of the corresponding
codewords and erases all three words as illustrated by
O's over the symbols in row 6 of Figure 16b.

The deinterleaver again reassembles the 3-symbol
codewords by delays which are complementary to the
interleaver delays: rows 6-7 of Figure 16b show the
delays with slanting arrows. The erased symbols again
spread out, but three erasures in codeword A2,B2,P2
cannot be corrected. However, the codeword A1,B1,P1
with B1 and P1 erased can be corrected by the second
decoder to give the true codeword A1,B1,P1. Then the
true B1 can be compared to the word A2,B1,P0,Q2 in
row 5, and the fact that B1 differs in this word implies that
B1 was one of the two errors in this word. Thus the true
B1 can be used to form a word with only one remaining
error (A2), and this word can be error corrected to give the true
A2,B1,P0. This is the feedback: a later error correction
(B1 in this example) is used to make an error correction
in a previously uncorrected word (which has already
been decoded), and then this correction of the past also
provides a correction of a symbol (A2 in this example)
for future use: the erased A2 being delayed in the
deinterleaver can be corrected to the true A2 and reduce the
number of errors in the codeword A2,B2,P2 to two.
Thus the codeword A2,B2,P2 can now be corrected.
Thus the feedback from the A1,B1,P1 correction to the
A2,B1,P0,Q2 decoding led to the correction of A2 and
then to the possible correction of the codeword
A2,B2,P2. Of course, the numbers of symbols used and
correctable in these examples are heuristic and only for
simple illustration.

The preferred embodiments may be varied in many
ways while retaining one or more of their features. For
example, the size of blocks, codes, thresholds,
morphology neighborhoods, quantization levels, symbols, and
so forth can be changed. Methods such as particular
splines, quantization methods, transform methods, and
so forth can be varied.

Claims 

1. An error correcting apparatus, comprising:

(a) a first error-correcting decoder;

(b) a deinterleaver coupled to the output of said
first decoder;

(c) a second error-correcting decoder coupled
to the output of said deinterleaver;

(d) a buffer coupled to said first decoder; and

(e) a feedback decoder coupled to said buffer
and said second decoder and with output to
said deinterleaver, said feedback decoder
decoding codewords from said buffer with
substituted error corrected symbols from said
second decoder.

2. The decoder of claim 1, wherein:

(a) said first decoder, said deinterleaver, said
second decoder, and said feedback decoder
are realized in a programmable digital signal
processor.

3. The decoder of claim 1, wherein:

(a) said first and second error-correcting
decoders use Reed-Solomon error correcting
codes.

4. A method of error correction decoding, comprising
the steps of:

(a) providing a first sequence of possibly-error-containing
codewords of the form made by the
steps of (i) encoding an input sequence of
information symbols to form a second sequence of error
correcting codewords, (ii) interleaving symbols
of codewords of said second sequence to form
a third sequence of interleaved words, (iii)
encoding said third sequence of interleaved
words to form a fourth sequence of error
correcting codewords, and (iv) introducing
possible errors into said fourth sequence to form
said first sequence;

(b) decoding said first sequence with error
correction to form a fifth sequence of words;

(c) deinterleaving said fifth sequence to form a
sixth sequence of codewords;

(d) decoding said sixth sequence with error
correction to form a seventh sequence of
words;

(e) substituting a symbol of a word of said
seventh sequence for the corresponding symbol of
a word of said first sequence when said
symbols differ;

(f) decoding said words with substituted
symbols from the preceding step (e) with error
correction to form words with corrected symbols of
said fifth sequence;

(g) using ones of said corrected symbols of
preceding step (f) in said deinterleaving of
preceding step (c).

5. A method of motion compensation in an object-oriented
video stream, comprising the steps of:

(a) providing a frame with a single object;

(b) replacing the background in said frame with
a constant;

(c) providing a second frame, said second
frame following said frame;

(d) for each block of pixels of said second frame
and related to said object, comparing said
block with second blocks of pixels in the result
of step (b); and

(e) defining a motion vector for said block by
the comparisons of said step (d).

6. A method of motion compensation in an object-oriented
video stream, comprising the steps of:

(a) providing a frame with objects O1, O2, ...
On to be separately encoded;

(b) for each of said objects Oj:

(i) form an image with said object Oj
reconstructed from a preceding frame and with
the pixels outside of said reconstructed Oj
set equal to an average of the background
pixel values;

(ii) for each block of pixels of said object Oj,
comparing said block with blocks of pixels
in said image formed in step (i); and

(iii) define a motion vector for said block by
the comparisons of said step (ii).

7. A method of subband transforming, comprising the
steps of:

(a) providing an image, said image containing a
region of interest;

(b) setting the pixels of said image outside of
said region of interest to a constant value; and

(c) applying a subband transform to the result
of step (b).

8. A method of describing a boundary of a region in an
image, comprising the steps of:

(a) providing an image as M rows by N columns
of pixels, said image containing a region;

(b) tiling said region with m rows by n columns
of k-by-k blocks of pixels, with k at least 2;

(c) labeling each of said blocks as in said
region when at least tk^2 pixels of said block are
in said region (including on the boundary of
said region), said multiplier t is a positive
number between 0 and 1; and

(d) describing the boundary of said region by
said labeling of said blocks.

9. The method of claim 8, wherein:

(a) said claim 8 step (b) of tiling includes the
steps of: (i) finding the minimal-size rectangle
with sides parallel the rows and columns and
which covers said region, (ii) defining said
blocks with one side of said rectangle
coinciding with a side of at least one of said blocks and
with a second side of said rectangle coinciding
with a side of at least one of said blocks with
said second side perpendicular to said one
side and wherein each of said blocks contains
at least one pixel of said rectangle.

10. The method of claim 9, wherein:

(a) said claim 8 step (d) of describing includes
(i) locating the intersection of said one side and
said second side of said rectangle and (ii) a bit
map of said blocks corresponding to said
labeling of claim 8 step (c).

11. A method of describing a boundary of a region in a
sequence of images, comprising the steps of:

(a) providing first and second images with each
of said images as M rows by N columns of
pixels, said second image containing a region;

(b) tiling said region with m rows by n columns
of k-by-k blocks of pixels, with k at least 2, by (i)
finding the minimal-size rectangle with sides
parallel the rows and columns and which
covers said region, (ii) defining said blocks with
one side of said rectangle coinciding with a
side of at least one of said blocks and with a
second side of said rectangle coinciding with a
side of at least one of said blocks with said
second side perpendicular to said one side and
wherein each of said blocks contains at least
one pixel of said rectangle;

(c) defining a bit map of said blocks by a block
is a 1 when at least tk^2 pixels of said block are
in said region (including on the boundary of
said region) and a 0 otherwise, said multiplier t
is a positive number between 0 and 1; and

(d) locating the intersection of said one side
and said second side of said rectangle; and

(e) describing the boundary of said region by
said intersection location and said bit map,
wherein said describing includes differences
from said first image.